Client Report - Famous Names

Unit 1 Task 3

Author

William Hardy

import pandas as pd
import numpy as np
from lets_plot import *

LetsPlot.setup_html(isolated_frame=True)
# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html

# Load the names-by-year data (one row per name and year, one column per state)
df = pd.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv")

QUESTION 1

Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names in a single chart. What trends do you notice? You must provide a chart. The years labels on your charts should not include a comma.

There are a couple of things that I noticed in this chart. Martha, Paul, and Peter were used less often than Mary, but their usage stayed fairly consistent through the years. Mary, on the other hand, hit a high between 1945 and 1954 and then dropped sharply.

# Q1
# Step 1: Sum across all state columns for total usage per row
df["Total_States"] = df.drop(columns=["name", "year", "Total"]).sum(axis=1)

# Step 2: Filter for the names and years
q1_names = ["Mary", "Martha", "Peter", "Paul"]
q1_df = df[(df["name"].isin(q1_names)) & (df["year"].between(1920, 2000))]

# Step 3: Plot the total count over time
ggplot(q1_df, aes(x="year", y="Total_States", color="name")) + \
    geom_line() + \
    ggtitle("Comparing the Names: Mary, Martha, Peter, and Paul (1920–2000)") + \
    xlab("Year") + \
    ylab("Total Count (All States Combined)") + \
    scale_x_continuous(breaks=list(range(1920, 2001, 10)), format = "d")
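To check the peak I describe for Mary against the numbers instead of only reading it off the chart, a small helper along these lines could be used. This is only a sketch and not part of the original analysis: `peak_year_per_name` is a hypothetical helper name, and the tiny `sample` frame holds made-up counts that only show the shape of the input.

```python
import pandas as pd


def peak_year_per_name(data, count_col="Total_States"):
    """Return one row per name: the year with the highest count."""
    # idxmax gives the row label of the largest count within each name
    peak_rows = data.groupby("name")[count_col].idxmax()
    return data.loc[peak_rows, ["name", "year", count_col]].reset_index(drop=True)


# Made-up counts, only to illustrate the input shape (NOT real data)
sample = pd.DataFrame({
    "name": ["Mary", "Mary", "Mary", "Paul", "Paul", "Paul"],
    "year": [1940, 1950, 1960, 1940, 1950, 1960],
    "Total_States": [10.0, 30.0, 20.0, 5.0, 6.0, 7.0],
})

peaks = peak_year_per_name(sample)
print(peaks)
```

In the report itself, the call would presumably be `peak_year_per_name(q1_df)`, which should return the peak year for each of the four names.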

QUESTION 2

Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage? You must provide a chart. The years labels on your charts should not include a comma.

The chart suggests that the movie Frozen did have an effect on the usage of the name Elsa. In 2013, the year the movie was released, the count is a little over 2,000 people, and then it roughly doubles to close to 4,500 people.

# Q2
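# Step 1: Sum across states
# Total_States may already exist from Q1, so drop it too; otherwise it is
# included in the sum and every count comes out doubled.
df["Total_States"] = df.drop(columns=["name", "year", "Total", "Total_States"], errors="ignore").sum(axis=1)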

# Step 2: Filter for Elsa
q2_df = df[(df["name"] == "Elsa") & (df["year"].between(2000, 2020))]

# Step 3: Plot it!
ggplot(q2_df, aes(x="year", y="Total_States")) + \
    geom_line(color="#800080") + \
    geom_vline(xintercept=2013, linetype="dashed", color="red") + \
    ggtitle("Popularity of 'Elsa' Over Time (2000–2020)") + \
    xlab("Year") + \
    ylab("Total Count") + \
    scale_x_continuous(breaks=list(range(2000, 2021, 2)), format = "d")
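To put a number on how much the name changed around the release year, the year-over-year change could be computed with something like the sketch below. It is not part of the original analysis: `year_over_year` is a hypothetical helper name, and the `sample` frame uses made-up counts only to show how it works.

```python
import pandas as pd


def year_over_year(data, count_col="Total_States"):
    """Add absolute and percent change from the previous year."""
    out = data.sort_values("year").copy()
    # diff() subtracts the previous row; pct_change() divides by it
    out["change"] = out[count_col].diff()
    out["pct_change"] = out[count_col].pct_change() * 100
    return out


# Made-up counts, only to illustrate the calculation (NOT real data)
sample = pd.DataFrame({
    "year": [2012, 2013, 2014],
    "Total_States": [100.0, 110.0, 220.0],
})

changes = year_over_year(sample)
print(changes)
```

In the report this would presumably be called as `year_over_year(q2_df)`. The first year has no previous row, so its change values are missing (NaN).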